A Comparison of String Distance Metrics on Usernames for Cross-Platform Identification

Abstract

People often use similar usernames across different social media sites, which makes it possible to correlate accounts between platforms. Although this observation was first made in 2009, no research has examined how to exploit it most efficiently. We show that ignoring letter case improves matching, and we find that Smith-Waterman is the best metric for matching usernames, achieving a success rate of 76%. This implies that earlier work that relied on other string matching metrics could achieve better results by using Smith-Waterman.
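To make the approach in the abstract concrete, the following is a minimal sketch of case-insensitive username matching with Smith-Waterman local alignment. The scoring values (`MATCH`, `MISMATCH`, `GAP`), the normalization by the shorter username, and the function names are assumptions chosen for illustration; the abstract does not state which parameters or normalization the authors used.

```python
# Illustrative scoring values (assumptions, not taken from the paper).
MATCH = 2
MISMATCH = -1
GAP = -1


def smith_waterman(a: str, b: str) -> int:
    """Return the best local-alignment score between a and b."""
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for ch_a in a:
        curr = [0]  # first column is always 0
        for j, ch_b in enumerate(b, start=1):
            diag = prev[j - 1] + (MATCH if ch_a == ch_b else MISMATCH)
            up = prev[j] + GAP        # gap in b
            left = curr[j - 1] + GAP  # gap in a
            cell = max(0, diag, up, left)  # local alignment never drops below 0
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def username_similarity(a: str, b: str, ignore_case: bool = True) -> float:
    """Score in [0, 1]: alignment score relative to a perfect match
    of the shorter username (a hypothetical normalization)."""
    if ignore_case:
        a, b = a.casefold(), b.casefold()
    shortest = min(len(a), len(b))
    if shortest == 0:
        return 0.0
    return smith_waterman(a, b) / (MATCH * shortest)


if __name__ == "__main__":
    print(username_similarity("JohnDoe", "johndoe"))
    print(username_similarity("JohnDoe", "johndoe", ignore_case=False))
```

With casing ignored, "JohnDoe" and "johndoe" receive a similarity of 1.0; with `ignore_case=False` the two capital letters count as mismatches and the score drops below 1.0. A threshold on this similarity would then decide whether two accounts are treated as a match; choosing that threshold is left open here.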

Related resources

A Comparison of String Distance Metrics for Name-Matching Tasks

Using an open-source Java toolkit of name-matching methods, we experimentally compare string distance metrics on the task of matching entity names. We investigate a number of different metrics proposed by different communities, including edit-distance metrics, fast heuristic string comparators, token-based distance metrics, and hybrid methods. Overall, the best-performing method is a hybrid s...

A Comparison of String Metrics for Matching Names and Records

We describe an open-source Java toolkit of methods for matching names and records. We summarize results obtained from using various string distance metrics on the task of matching entity names. These metrics include distance functions proposed by several different communities, such as edit-distance metrics, fast heuristic string comparators, token-based distance metrics, and hybrid methods. We ...

Word Similarity Calculation by Using the Edit Distance Metrics with Consonant Normalization

Edit distance metrics are widely used in applications such as string comparison and spelling error correction. Hamming distance is a metric for two strings of equal length, and Damerau-Levenshtein distance is a well-known metric for making spelling corrections through string-to-string comparison. Previous distance metrics seem to be appropriate for alphabetic languages like English and Eur...

Usability of String Distance Metrics for Name Matching Tasks in Polish

This paper presents the results of numerous experiments on the usability of well-established string distance metrics, and some new variants thereof, for various name-matching tasks in Polish.

FBK-HLT: An Effective System for Paraphrase Identification and Semantic Similarity in Twitter

This paper describes our system, FBK-HLT, and reports its performance in SemEval 2015 Task #1, "Paraphrase and Semantic Similarity in Twitter", for both subtasks. We submitted two runs with different classifiers, combining typical features (lexical similarity, string similarity, word n-grams, etc.) with machine translation metrics and edit distance features. We outperfor...

Publication date: 2017